# Superscalar processors: example 1

### E. Sanchez, M. Sonza Reorda

Politecnico di Torino

Dipartimento di Automatica e Informatica (DAUIN)

Torino - Italy

This work is licensed under the Creative Commons (CC BY-SA) License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/



Consider a superscalar MIPS64 architecture that implements dynamic scheduling, speculation, and multiple issue, and that is composed of the following units:

- An issue unit able to process 2 instructions per clock period; when a branch instruction is issued, only one instruction is issued in that clock period
- A commit unit able to process 1 instruction per clock period

It also includes the following functional units (for each unit, the number of clock periods needed to complete one instruction is reported):

- 1 unit for memory access: 1 clock period
- 1 unit for integer arithmetic instructions: 1 clock period
- 1 unit for branch instructions: 1 clock period
- 1 unit for FP multiplication (pipelined): 6 clock periods
- 1 unit for FP division (unpipelined): 8 clock periods
- 1 unit for other FP instructions (pipelined): 2 clock periods
- 1 Common Data Bus.
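The difference between pipelined and unpipelined units can be sketched in Python as follows. This is a minimal illustration and not part of the original exercise: the `Unit` class, the dictionary keys, and the two helper functions are hypothetical names. The sketch assumes that a result is ready for the CDB in the clock period right after execution ends, and it ignores the separate MEM stage used by memory accesses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    latency: int     # clock periods needed to complete one instruction
    pipelined: bool  # True if a new instruction can start every clock period


# Latencies taken from the list above. The keys are illustrative names.
# Marking the 1-cycle units as pipelined is an assumption with no effect,
# since with latency 1 both cases free the unit after one clock period.
UNITS = {
    "mem": Unit(1, True),
    "int": Unit(1, True),
    "branch": Unit(1, True),
    "fp_mul": Unit(6, True),
    "fp_div": Unit(8, False),
    "fp_other": Unit(2, True),
}


def result_cycle(unit: Unit, start: int) -> int:
    """First clock period after execution ends (assumed CDB slot, if the bus is free)."""
    return start + unit.latency


def next_free_cycle(unit: Unit, start: int) -> int:
    """Earliest clock period in which the unit can start another instruction."""
    return start + (1 if unit.pipelined else unit.latency)


# A division starting in cycle 7 occupies the divider for 8 clock periods.
print(result_cycle(UNITS["fp_div"], 7))       # 15
print(next_free_cycle(UNITS["fp_div"], 7))    # 15
# A pipelined FP unit can accept a new instruction in the very next cycle.
print(next_free_cycle(UNITS["fp_other"], 6))  # 7
```

Under these assumptions, only the unpipelined FP divider can force a second instruction of the same kind to wait for the unit itself rather than for its operands.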

#### Let us also assume that

- Branch predictions are always correct
- Memory accesses never trigger a cache miss.

Use the following table to describe the behavior of the processor during the execution of the first 2 iterations of a loop composed of the instructions listed in the table, and compute the total number of clock cycles required. Registers f11 and f12 hold two constants.

| # iteration | Instruction       | ISSUE | EXE | MEM | CDB | COMMIT |
|-------------|-------------------|-------|-----|-----|-----|--------|
| 1           | l.d f1,v1(r1)     |       |     |     |     |        |
| 1           | l.d f2,v2(r1)     |       |     |     |     |        |
| 1           | l.d f3,v3(r1)     |       |     |     |     |        |
| 1           | div.d f5, f3, f11 |       |     |     |     |        |
| 1           | sub.d f4, f1, f2  |       |     |     |     |        |
| 1           | add.d f6, f4, f5  |       |     |     |     |        |
| 1           | div.d f7,f6,f12   |       |     |     |     |        |
| 1           | s.d f7,v4(r1)     |       |     |     |     |        |
| 1           | daddui r1,r1,8    |       |     |     |     |        |
| 1           | daddi r2,r2,-1    |       |     |     |     |        |
| 1           | bnez r2,loop      |       |     |     |     |        |
| 2           | l.d f1,v1(r1)     |       |     |     |     |        |
| 2           | l.d f2,v2(r1)     |       |     |     |     |        |
| 2           | l.d f3,v3(r1)     |       |     |     |     |        |
| 2           | div.d f5, f3, f11 |       |     |     |     |        |
| 2           | sub.d f4, f1, f2  |       |     |     |     |        |
| 2           | add.d f6, f4, f5  |       |     |     |     |        |
| 2           | div.d f7,f6,f12   |       |     |     |     |        |
| 2           | s.d f7,v4(r1)     |       |     |     |     |        |
| 2           | daddui r1,r1,8    |       |     |     |     |        |
| 2           | daddi r2,r2,-1    |       |     |     |     |        |
| 2           | bnez r2,loop      |       |     |     |     |        |

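The completed table is shown below. In the EXE column, the letter after the cycle number appears to identify the functional unit used (m = memory, d = FP division, a = other FP, i = integer, j = branch). The last instruction commits in clock cycle 47, so the two iterations require 47 clock cycles.

| # iteration | Instruction       | ISSUE | EXE         | MEM | CDB | COMMIT | Notes                          |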
|-------------|-------------------|-------|-------------|-----|-----|--------|--------------------------------|
| 1           | l.d f1,v1(r1)     | 1     | 2m          | 3   | 4   | 5      |                                |
| 1           | l.d f2,v2(r1)     | 1     | 3m          | 4   | 5   | 6      |                                |
| 1           | l.d f3,v3(r1)     | 2     | 4m          | 5   | 6   | 7      |                                |
| 1           | div.d f5, f3, f11 | 2     | 7d          |     | 15  | 16     | Wait for f3                    |
| 1           | sub.d f4, f1, f2  | 3     | 6a          |     | 8   | 17     | Wait for f2                    |
| 1           | add.d f6, f4, f5  | 3     | 16a         |     | 18  | 19     | Wait for f5                    |
| 1           | div.d f7,f6,f12   | 4     | 23d         |     | 31  | 32     | Wait for f6, then for div unit |
| 1           | s.d f7,v4(r1)     | 4     | 5m          |     |     | 33     |                                |
| 1           | daddui r1,r1,8    | 5     | 6i          |     | 7   | 34     |                                |
| 1           | daddi r2,r2,-1    | 5     | 7i          |     | 9   | 35     | CDB busy in cycle 8            |
| 1           | bnez r2,loop      | 6     | 10j         |     |     | 36     | Wait for r2                    |
| 2           | l.d f1,v1(r1)     | 7     | 8m          | 9   | 10  | 37     |                                |
| 2           | l.d f2,v2(r1)     | 7     | 9m          | 10  | 11  | 38     |                                |
| 2           | l.d f3,v3(r1)     | 8     | 10m         | 11  | 12  | 39     |                                |
| 2           | div.d f5, f3, f11 | 8     | 15d         |     | 23  | 40     | Wait for f3, then for div unit |
| 2           | sub.d f4, f1, f2  | 9     | 12a         |     | 14  | 41     | Wait for f2                    |
| 2           | add.d f6, f4, f5  | 9     | 24a         |     | 26  | 42     | Wait for f5                    |
| 2           | div.d f7,f6,f12   | 10    | 31d         |     | 39  | 43     | Wait for f6, then for div unit |
| 2           | s.d f7,v4(r1)     | 10    | 11m         |     |     | 44     |                                |
| 2           | daddui r1,r1,8    | 11    | 12i         |     | 13  | 45     |                                |
| 2           | daddi r2,r2,-1    | 11    | 13i         |     | 16  | 46     | CDB busy in cycles 14 and 15   |
| 2           | bnez r2,loop      | 12    | 17j         |     |     | 47     | Wait for r2                    |
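
The COMMIT column and the use of the single Common Data Bus can be cross-checked with a short Python sketch. It is only an illustration under stated assumptions: instructions commit in program order, at most one per clock period, and no earlier than the clock period after their last completed stage (the CDB cycle, or the EXE cycle for stores and branches, which do not write the CDB). The names `LAST_STAGE`, `CDB_CYCLES`, `commit_cycles`, and `cdb_is_conflict_free` are hypothetical, and the cycle numbers are copied from the table above.

```python
# Cycle of the last completed stage of each instruction, in program order:
# the CDB cycle, or the EXE cycle for s.d and bnez (no CDB write).
# Assumption: the store's wait for f7 is not modeled separately, because
# in-order commit after the preceding div.d already covers it here.
LAST_STAGE = [
    4, 5, 6, 15, 8, 18, 31, 5, 7, 9, 10,         # iteration 1
    10, 11, 12, 23, 14, 26, 39, 11, 13, 16, 17,  # iteration 2
]

# CDB cycles only (stores and branches excluded), same source.
CDB_CYCLES = [
    4, 5, 6, 15, 8, 18, 31, 7, 9,       # iteration 1
    10, 11, 12, 23, 14, 26, 39, 13, 16,  # iteration 2
]


def commit_cycles(last_stage):
    """In-order commit, one instruction per clock period."""
    commits = []
    previous = 0
    for done in last_stage:
        # Commit after the previous instruction and after this one is done.
        previous = max(previous + 1, done + 1)
        commits.append(previous)
    return commits


def cdb_is_conflict_free(cdb_cycles):
    """True if no two instructions use the single CDB in the same cycle."""
    return len(set(cdb_cycles)) == len(cdb_cycles)


commits = commit_cycles(LAST_STAGE)
print(commits[-1])                       # 47
print(cdb_is_conflict_free(CDB_CYCLES))  # True
```

With a commit unit that retires one instruction per clock period, the long FP divisions near the start of the sequence delay every later commit, which is why the integer instructions commit long after they finish executing.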